08/942,168, entitled, "Method For Automatically Reporting
APPENDICES

Appendix A, which forms a part of this disclosure, is a list 
of commonly owned copending U.S. patent applications. 
Each one of the applications listed in Appendix A is hereby 
incorporated herein in its entirety by reference thereto. 

COPYRIGHT RIGHTS 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure, as it appears in the Patent and Trademark Office 
patent files or records, but otherwise reserves all copyright 
rights whatsoever. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The invention relates to the reporting of problems and/or 
failure conditions in electronic systems. More particularly, 
the invention relates to a system and method for automati- 
cally reporting failure conditions in a server system. 

2. Description of the Related Technology 

In the computer industry, the fast and efficient detection of 
system errors and/or failures, and the subsequent correction 
of such failures, is critical to providing quality performance 
and product reliability to the users and buyers of computer 
systems. Particularly with respect to server computers which 
are accessed and utilized by many end users, early detection 
and notification of system problems and failures is an 
extremely desirable performance characteristic, especially 
for users who depend on the server to obtain data and 
information in their daily business operations, for example. 

Typically, after a server has failed, users trying to access 
that server do not know that a problem exists or what the 
nature of the problem is. If a user experiences undue delay 
in connecting to the server or accessing a database through 
the server, the user typically does not know whether there is 
something wrong with the server, something wrong with his
or her connection line, or whether both problems exist. In
this scenario, the user must wait for a system operator, at the
site where the server is located, to detect the error or failure
and correct it. Hours can elapse before the failure is corrected.
Often, a system operator or administrator will not
discover the failure until users experience problems and start
complaining. In the meantime, an important event may be
missed and time is wasted, leading to user dissatisfaction
with the server system.

Therefore, what is needed is a method and system for 
early detection of system failures or problems and prompt 
notification to a system operator or control center of the 
failure condition so that remedial actions may be quickly 
taken. In addition, for servers which may be remotely 
located from a control center, for example, a method and 
system for notifying the control center at a remote location 
is needed. 

SUMMARY OF THE INVENTION

The invention addresses the above and other needs by 
providing a method and system for detecting a system 
failure and automatically reporting the failure to a system 
operator who may be located at or near the site where the 
server is present, or remotely located from the server such 
that the system operator communicates with the server via a 
modem connection. As used herein, the terms "failure", 
"system failure", "system failure condition" and any com- 
bination or conjugation of these terms refers to any problem, 
error, fault, or out of tolerance operating condition or 
parameter which may be detected in a computer and/or 
server system. Additionally, these terms may refer to a 
change in a status or condition of the server system, or a 
component or subsystem thereof. 

In one embodiment of the invention, a system for reporting
a failure condition in a server system includes: a
controller which monitors the server system for system
failures, and generates an event signal and failure information
if a system failure is detected; a system interface,
coupled to the controller, which receives the event signal; a
central processing unit, coupled to the system interface,
wherein, upon receiving the event signal, the system interface
reports an occurrence of an event to the central processing
unit; and a system log which receives failure information
communicated from the system interface and stores
said failure information.

In another embodiment, the system described above further
includes a system recorder, coupled between the controller
and the system log, for receiving the failure information
from the controller, assigning a time value to the
failure information, and subsequently storing the failure
information with the time value into the system log.

In another embodiment, a failure reporting system for a
server system includes the following: a controller which
monitors the server system for system failures and generates
an event signal and failure information if a system failure is
detected; a system recorder, coupled to the controller, which
receives failure information and assigns a time value to the
failure information; a system log which stores failure information
received from the system recorder; and a system
interface, coupled to the controller, which receives and
stores the event signal, and reports an occurrence of an event
to a central processing unit which is coupled to the system
interface, wherein the central processing unit executes a
software program which allows a system operator to access
the system log to read failure information stored therein.

In a further embodiment, the system described above
further includes a remote interface, coupled to the controller,
which receives the event signal and reports the occurrence of
an event to a computer external to the server system.

In yet another embodiment, a failure reporting system for 
a server system, includes: a controller which monitors the 
server system for system failures and generates an event 
signal and failure information if a system failure is detected; 
a system recorder, coupled to the controller, which receives 
the failure information and assigns a date and time to the 
failure information; a system log which stores the failure 
information; a system interface, coupled to the controller, 
which receives and stores the event signal and reports an 
occurrence of an event to a central processing unit, coupled 
to the system interface, wherein the central processing unit 
executes a software program which allows a system operator 
to access the system log to read failure information stored 
therein; a remote interface, coupled to the controller, which 
receives the event signal and reports the occurrence of an 
event to a computer external to the server system; and a 
switch, coupled to the remote interface, which switches 
connectivity to the remote interface between a first computer 
and a second computer, wherein the first computer is a local 
computer, coupled to the switch via a local communications 
line, and the second computer is a remote computer, coupled 
to the switch via a modem connection. 

In a further embodiment, a failure reporting system in a 
server system, includes: means for detecting a system failure 
condition; means for transmitting failure information related 
to the failure condition to a system recorder; means for 
storing the failure information; and means for reporting an 
occurrence of an event to a central processing unit of the 
server system. 

In another embodiment, the invention is a program stor- 
age device which stores instructions that when executed by 
a computer perform a method, wherein the method com- 
prises: detecting a system failure condition; transmitting 
failure information related to the failure condition to a 
system recorder; storing the failure information in a system 
log; and reporting an occurrence of an event to a central 
processing unit of the server system. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of a server having a failure 
reporting system for detecting, recording and reporting a 
system failure in accordance with one embodiment of the 
invention. 

FIG. 2 is a system block diagram of one embodiment of
a system interface which is used to transfer data between the
server's operating system and the server's failure reporting
system, in accordance with the invention.

FIG. 3A is a table illustrating one embodiment of a data 
format for a read request signal communicated by the system 
interface and/or the remote interface of FIG. 1 in accordance 
with the invention. 

FIG. 3B is a table illustrating one embodiment of a data 
format for a write request signal communicated by the 
system interface and/or the remote interface of FIG. 1 in 
accordance with the invention. 

FIG. 3C is a table illustrating one embodiment of a data 
format for a read response signal communicated by the 
system interface and/or the remote interface of FIG. 1 in 
accordance with the invention. 

FIG. 3D is a table illustrating one embodiment of a data 
format for a write response signal communicated by the 
system interface and/or the remote interface of FIG. 1 in
accordance with the invention. 

FIG. 4 is a system block diagram of one embodiment of 
the remote interface of FIG. 1. 

FIGS. 5A-5C illustrate one embodiment of a data format 
for a request, a response, and an interrupt signal, 
respectively, which are received and transmitted by the 
remote interface of FIG. 1. 

FIG. 6 is a system block diagram of one embodiment of 
the system recorder of FIG. 1. 

FIGS. 7A-7D together form a flowchart diagram of one 
embodiment of a process of storing information in the 
system log and retrieving information from the system log. 

FIGS. 8A-8D together form a flowchart illustrating one 
embodiment of a process for detecting and reporting system 
failures in accordance with the invention. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The invention is described in detail below with reference 
to the figures, wherein like elements are referenced with like 
numerals throughout. 

Referring to FIG. 1, a block diagram of one embodiment 
of a server system 100 is illustrated. The server system 100 
includes a central processing unit (CPU) 101 which executes 
the operating system (OS) software, which controls the 
communications protocol of the server system 100. The 
CPU 101 is coupled to an Industry Standard Architecture 
bus (ISA bus) 103 which transfers data to and from the CPU 
101. The ISA bus 103 and its functionality are well-known 
in the art. Coupled to the ISA bus 103 is a system interface 
105 which receives event signals from one or more micro- 
controllers that monitor and control various subsystems and 
components of the server system 100. As described in 
further detail below, an event signal sent to the system 
interface 105 indicates that a system failure or error has 
occurred. The various microcontrollers which monitor the 
server system 100 are also described in further detail below.
As used herein, the term "event" may refer to the occurrence 
of any type of system failure. The structure and functionality 
of the system interface 105 is described in greater detail 
below with respect to FIG. 2. Additionally, as used herein 
the terms "signal," "command" and "data" and any conjugation
and combinations thereof, are used synonymously
and interchangeably and refer to any information or value 
that may be transmitted, received or communicated between 
two electronic entities. 

Coupled to the system interface 105 is a system bus 107.
In one embodiment, the system bus 107 is an Inter-IC
control bus (I²C bus), which transfers data to and from the
various controllers and subsystems mentioned above. The
I²C bus and the addressing protocol in which data is transferred
across the bus are well-known in the art. One embodiment
of a messaging protocol used in this I²C bus architecture
is discussed in further detail below with reference to
FIGS. 3A-3D. The command, diagnostic, monitoring, and
logging functions of the failure reporting system of the
invention are accessed through the common I²C bus protocol.
In one embodiment, the I²C bus protocol uses addresses
typically stored in a first byte of a data stream, as the means
of identifying the various devices and commands to those
devices. Any function can be queried by generating a "read"
request, which has its address as part of its protocol format.
Conversely, a function can be executed by "writing" to an
address specified in the protocol format. Any controller or
processor connected to the bus can initiate read and write
requests by sending a message on the I²C bus to the
processor responsible for that function.

Coupled to the system bus 107 is a CPU A controller 109, 
a CPU B controller 111, a chassis controller 112 and four 
canister controllers 113. These controllers monitor and con- 
trol various operating parameters and/or conditions of the 
subsystems and components of the server system 100. For 
example, CPU A controller 109 may monitor the system fan 
speeds, CPU B controller 111 may monitor the operating 
temperature of the CPU 101, the chassis controller 112 may 
monitor the presence of various circuit boards and components
of the server system, and each of the canister controllers
113 may monitor the presence and other operating
conditions of "canisters" connected to the server system 
100. A "canister" is a detachable module which provides 
expandability to the number of peripheral component inter- 
face (PCI) devices that may be integrated into the server 
system 100. In one embodiment, each canister is capable of 
providing I/O slots for up to four PCI cards, each capable of 
controlling and arbitrating access to a PCI device, such as a 
CD ROM disk drive, for example. A more detailed descrip- 
tion of a canister can be found in a co-pending and com- 
monly owned patent application entitled, "Network Server 
With Network Interface, Data Storage and Power Modules 
That May Be Removed and Replaced Without Powering 
Down the Network", which is listed in Appendix A attached 
hereto. 

If one or more of the various controllers detects a failure, 
the respective controller sends an event signal to the system 
interface 105 which subsequently reports the occurrence of 
the event to the CPU 101. In one embodiment, the control- 
lers 109, 111 and 113 are PIC16C65 microcontroller chips 
manufactured by Microchip Technologies, Inc. and the chas- 
sis controller 112 is a PIC16C74 microcontroller chip manu- 
factured by Microchip Technologies, Inc. 

Upon detecting a failure condition, a controller (109, 111, 
112 or 113) not only transmits an event signal to the system 
interface 105, but also transmits failure information associ- 
ated with the failure condition to a system recorder 115 
connected to the system bus 107. The system recorder 115 
then assigns a time stamp to the failure information and logs 
the failure by storing the failure information, along with its 
time stamp, into a system log 117. The operation and 
functionality of the system recorder 115 is described in 
further detail below with reference to FIG. 6. In one 
embodiment, the system log 117 is a non-volatile random 
access memory (NVRAM), which is well-known for its 
characteristics in maintaining the integrity of data stored 
within it, even when power to the memory cells is cut off for 
extended periods of time as a result of a system shut-down 
or power failure. The following are examples of various 
monitoring functions performed by some of the controllers 
described above. However, it is understood that the inven- 
tion is not limited to these monitoring functions which serve 
only as examples. 

In one embodiment, the controller 109 may be coupled to 
a system fan unit (not shown) and periodically monitor the 
speed of the fan. In one version, the fan unit transmits a pulse 
waveform to the controller 109, the frequency of which is 
proportional to the rate of rotation of the fan. The controller 
109 checks the frequency of the pulse waveform on a 
periodic basis and determines whether the frequency is 
within a specified range of acceptable fan speeds. If a 
measured frequency is either too slow or too fast, the 
controller 109 detects a fan failure condition and sends an 
event signal to the system interface 105. The controller 109 
also sends failure information to the system recorder 115
which assigns a time value to the failure information and
stores the failure information with its time stamp into the
system log 117. After the system interface 105 receives an
event signal, it reports the occurrence of the event to the
CPU 101.
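
By way of illustration only, the periodic fan check described above may be sketched in Python as follows. The function names, the callback parameters and the numeric range used in the usage note are hypothetical and do not appear in the disclosure; the callbacks merely stand in for the event signal sent to the system interface 105 and the failure information sent to the system recorder 115.

```python
from typing import Callable


def fan_speed_ok(pulse_hz: float, min_hz: float, max_hz: float) -> bool:
    """True when the pulse frequency lies within the acceptable range."""
    return min_hz <= pulse_hz <= max_hz


def check_fan(pulse_hz: float, min_hz: float, max_hz: float,
              send_event: Callable[[str], None],
              record_failure: Callable[[str], None]) -> bool:
    """Run one periodic check; return True if a fan failure was detected."""
    if fan_speed_ok(pulse_hz, min_hz, max_hz):
        return False
    # Event signal to the system interface (hypothetical event name).
    send_event("fan")
    # Failure information to the system recorder, which would time-stamp it.
    record_failure(
        f"fan pulse frequency {pulse_hz} Hz outside {min_hz}-{max_hz} Hz"
    )
    return True
```

For example, with an assumed acceptable range of 40 Hz to 120 Hz, a measured frequency of 10 Hz or of 500 Hz would each produce one event and one log entry, while 50 Hz would produce neither.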

As another example, the controller 111 may monitor a
system temperature parameter. For example, a temperature
sensor (not shown) may be coupled to the CPU 101 for
monitoring its operating temperature. In one embodiment,
the temperature sensor generates a voltage which is proportional
to a measured operating temperature of the CPU 101.
This voltage may then be converted by well-known means
into a digital data signal and subsequently transmitted to the
controller 111. The controller 111 then determines whether
the measured temperature falls within specified limits. If the
measured temperature is either too low or too high, a
temperature failure condition is detected and an event signal
is transmitted to the system interface 105 which subsequently
reports the event to CPU 101 and an entry is written
to the system log 117 by the system recorder 115.

In another embodiment, multiple temperature sensors (not
shown) are coupled to a temperature bus (not shown). The
temperature readings of all the sensors on the temperature
bus are monitored every second and are read by Dallas Inc.
temperature transducers (not shown) connected to the system
bus 107. In one embodiment, the temperature transducers
are model no. DS1621 digital thermometers, made by
Dallas Semiconductor Corp. of Dallas, Tex. The temperature
sensors are read in address order. The criteria for detecting
a temperature fault are provided by two temperature limits: a
shutdown limit, which is initialized to 70° C.; and lower and
upper warning limits, which are set at -25° C. and 55° C.,
respectively. Each sensor is compared to the shutdown limit.
If any temperature exceeds this limit, the system is powered
off. If it is lower than the shutdown limit, each sensor is then
compared to the warning limits. If any temperature is below
-25° C. or above 55° C., a warning condition is created, a
temperature LED is set, a temperature event signal is sent to
the system interface 105, and an entry is written to the
system log 117 by the system recorder 115.
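
The two-stage comparison described above may be sketched as follows, by way of example only. The limit values are those stated above; the function name and the returned labels are hypothetical, and the actions taken on each outcome (powering off, setting the LED, sending the event signal, writing the log entry) are left to the caller.

```python
from typing import Iterable

SHUTDOWN_LIMIT_C = 70.0   # shutdown limit stated in the text
WARN_LOW_C = -25.0        # lower warning limit stated in the text
WARN_HIGH_C = 55.0        # upper warning limit stated in the text


def classify_temperatures(readings_c: Iterable[float]) -> str:
    """Classify one pass over the sensors, read in address order.

    Returns "shutdown" if any reading exceeds the shutdown limit,
    "warning" if any reading is outside the warning limits,
    and "ok" otherwise.
    """
    readings = list(readings_c)
    # Stage 1: every sensor is compared to the shutdown limit first.
    if any(t > SHUTDOWN_LIMIT_C for t in readings):
        return "shutdown"
    # Stage 2: only then is each sensor compared to the warning limits.
    if any(t < WARN_LOW_C or t > WARN_HIGH_C for t in readings):
        return "warning"
    return "ok"
```

Note that a reading of exactly 70° C. does not exceed the shutdown limit in this sketch but does exceed the upper warning limit, and is therefore classified as a warning.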

The chassis controller 112 can monitor the presence of
power supplies, for example. In one embodiment, power
supplies may be detected and identified by a signal line
coupling each power supply to a one-wire serial bus (not
shown) which is in turn connected to a serial number chip
(not shown) for identifying the serial number of each power
supply. In one embodiment, the serial number chip is a
DS2502 1 Kbit Add-only memory, manufactured by Dallas
Semiconductor Corp. In order to detect the presence of a
power supply, a trigger pulse may be sent by the chassis
controller 112 to detect a power supply presence pulse. If
there is a change in the presence of a power supply, a
presence bit is updated and a power supply event is sent to
the system interface 105. The power supply data is then
written to the system log 117. If a power supply is removed
from the system, no further action takes place. The length of
the serial number string for that power supply address is set
to zero. However, if a power supply is installed, its serial
number is read by the Dallas Semiconductor Corp. one-wire
protocol and written to the system log 117.

As shown in FIG. 1, the server system 100 further
includes a remote interface 119 that is also connected to the
system bus 107. The remote interface 119 also receives
event signals from the various controllers 109, 111, 112
and/or 113 when a failure condition has been detected. The
remote interface 119 is a link to the server system 100 for a
remote client. In one embodiment, the remote interface 119
encapsulates messages in a transmission packet to provide
error-free communications and link security. This method
establishes a communication protocol in which data is
transmitted to and from the remote interface 119 by using a
serial communication protocol known as "byte stuffing." In
this communication method, certain byte values in the data
stream always have a particular meaning. For example, a
certain byte value may indicate the start or end of a message,
an interrupt signal, or any other command. A byte value may
indicate the type or status of a message, or even be the
message itself. However, the invention is not limited to any
particular type of communication protocol and any protocol
which is suitable may be used by the remote interface 119 in
accordance with the invention. The remote interface 119 is
described in further detail below with reference to FIG. 4.
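
A generic byte stuffing scheme of the kind referred to above may be sketched as follows. This is an illustration of the general technique only and is not the protocol of the remote interface 119: the reserved byte values (SOM, EOM, ESC) and the escaping rule (XOR with 0x20) are assumptions chosen for the example and are not taken from the disclosure.

```python
SOM, EOM, ESC = 0x02, 0x03, 0x1B   # hypothetical reserved byte values
RESERVED = {SOM, EOM, ESC}
ESC_XOR = 0x20                     # hypothetical escaping rule


def stuff(payload: bytes) -> bytes:
    """Frame a payload so reserved values never appear inside the message body."""
    out = bytearray([SOM])
    for b in payload:
        if b in RESERVED:
            out += bytes([ESC, b ^ ESC_XOR])   # escape a reserved value
        else:
            out.append(b)
    out.append(EOM)
    return bytes(out)


def unstuff(frame: bytes) -> bytes:
    """Recover the payload from a frame produced by stuff()."""
    if len(frame) < 2 or frame[0] != SOM or frame[-1] != EOM:
        raise ValueError("missing start or end marker")
    out = bytearray()
    body = iter(frame[1:-1])
    for b in body:
        if b == ESC:
            nxt = next(body, None)
            if nxt is None:
                raise ValueError("dangling escape byte")
            b = nxt ^ ESC_XOR
        out.append(b)
    return bytes(out)
```

Because the reserved values are escaped wherever they occur in the payload, a receiver can treat any unescaped occurrence of the start or end value as a message boundary.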

Through the remote interface 119, a failure condition may
be reported to a local system operator or to a remote
operator. As used herein, the term "local" refers to a
computer, system, operator or user that is not located in the
same room as the hardware of the server system 100 but may
be located nearby in a different room of the same building,
for example. The term "remote" refers to a computer, system
or operator that may be located in another city or state, for
example, and is connected to the server system via a
modem-to-modem connection. The remote operator is typically
a client who is authorized to access data and information
from the server system 100 through a remote computer
125.

Coupled to the remote interface 119 is a switch 121 for
switching connectivity to the remote interface 119 between
a local computer 123 and a remote computer 125. As shown
in FIG. 1, the local computer 123 is connected to the remote
interface 119 via a local communications line 127. The local
communications line 127 may be any type of communication
line, e.g., an RS232 line, suitable for transmitting data.
The remote computer 125 is connected to the remote interface
via a modem-to-modem connection established by a
client modem 129 coupled to a server modem 131. The
client modem 129 is connected to the server modem 131 by
a telephone line 133.

The system interface 105, the system bus 107, the controllers
109, 111, 112 and 113, the system recorder 115, the
system log 117, and the remote interface 119 are part of a
network of controllers and processors which form the failure
reporting system of the invention. One embodiment of this
failure reporting system is known as the Intrapulse System™,
designed and manufactured by Netframe, Inc.,
located at Milpitas, Calif. In FIG. 1, the Intrapulse System
is that portion of the components surrounded by the dashed
lines. The Intrapulse System monitors the status and operational
parameters of the various subsystems of the server
system 100 and provides system failure and error reports to
a CPU 101 of the server system 100. Upon reporting the
occurrence of an event to the CPU 101, the CPU 101
executes a software program which allows a system operator
to access further information regarding the system failure
condition and thereafter take appropriate steps to remedy the
situation.

Referring to FIG. 2, a block diagram of one embodiment
of the system interface 105 is shown surrounded by dashed
lines. The system interface 105 is the interface used by the
server system 100 to report failure events to the CPU 101.
Furthermore, a system operator can access failure information
related to a detected system failure by means of the
system interface 105. A software program executed by the
operating system of the CPU 101 allows the CPU 101 to
communicate with the system interface 105 in order to
retrieve information stored in the system log 117, as
described above. In one embodiment, this software program
is the Maestro Central program, manufactured by Netframe,
Inc. The operating system of the CPU 101 may be an
operating system (OS) driver program, such as Windows
NT™ or Netware™ for Windows, for example.

The system interface 105 includes a system interface
processor 201 which receives event and request signals,
processes these signals, and transmits command, status and
response signals to the operating system of the CPU 101. In
one embodiment the system interface processor 201 is a
PIC16C65 controller chip which includes an event memory
(not shown) organized as a bit vector, having at least sixteen
bits. Each bit in the bit vector represents a particular type of
event. Writing an event to the system interface processor 201
sets a bit in the bit vector that represents the event. Upon
receiving an event signal from the controller 109 (FIG. 1),
for example, the system interface 105 reports the occurrence
of an event to the CPU 101 by sending an interrupt to the
CPU 101. Upon receiving the interrupt, the CPU 101 will
check the status of the system interface 105 in order to
ascertain that an event is pending. Alternatively, the reporting
of the occurrence of an event may be implemented by
programming the CPU 101 to periodically poll the status of
the system interface 105 in order to ascertain whether an
event is pending. The CPU 101 may then read the bit vector
in the system interface processor 201 to ascertain the type of
event that occurred and thereafter notify a system operator
of the event by displaying an event message on a monitor
coupled to the CPU 101. After the system operator has been
notified of the event, as described above, he or she may then
obtain further information about the system failure which
generated the event signal by accessing the system log 117.
This capability is also provided by the Maestro Central
software program.

The system interface 105 communicates with the CPU
101 by receiving request signals from the CPU 101 and
sending response signals back to the CPU 101. Furthermore,
the system interface 105 can send and receive status and
command signals to and from the CPU 101. For example, a
request signal may be sent from a system operator enquiring
as to whether the system interface 105 has received any
event signals, or enquiring as to the status of a particular
processor, subsystem, operating parameter, etc. A request
signal buffer 203 is coupled to the system interface processor
201 and stores, or queues, request signals in the order that
they are received. Similarly, a response buffer 205 is coupled
to the system interface processor 201 and queues outgoing
response signals in the order that they are received.

A message data register (MDR) 207 is coupled to the
request and response buffers 203 and 205. In one
embodiment, the MDR 207 is eight bits wide and has a fixed
address which may be accessed by the server's operating
system via the ISA bus 103 coupled to the MDR 207. As
shown in FIG. 2, the MDR 207 has an I/O address of 0CC0h.
When a system operator desires to send a request signal to
the system interface processor 201, he or she must first
access the MDR 207 through the operating system of the
server which knows the address of the MDR 207.

One embodiment of a data format for the request and
response signals is illustrated in FIGS. 3A-3D. FIG. 3A
shows a data format for a read request signal. FIG. 3B shows
a similar data format for a write request signal. FIG. 3C
shows a data format for a read response signal and FIG. 3D
shows a data format for a write response signal.
The following is a summary of the data fields shown in
FIGS. 3A-3D:
FIELD                 DESCRIPTION

Slave Addr            Specifies the processor identification code. This field is
                      7 bits wide. Bit [7 . . . 1].
LSBit                 Specifies what type of activity is taking place. If LSBit
                      is clear (0), the master is transmitting to a slave. If
                      LSBit is set (1), the master is receiving from a slave.
MSBit                 Specifies the type of command. It is bit 7 of byte 1 of
                      a request. If this bit is clear (0), this is a write
                      command. If it is set (1), this is a read command.
Type                  Specifies the data type of this command, such as bit or
                      string.
Command ID (LSB)      Specifies the least significant byte of the address of the
                      processor.
Command ID (MSB)      Specifies the most significant byte of the address of the
                      processor.
Length (N)
  Read Request        Specifies the length of the data that the master expects to
                      get back from a read response. The length, which is in
                      bytes, does not include the Status, Check Sum, and
                      Inverted Slave Addr fields.
  Read Response       Specifies the length of the data immediately following
                      this byte, that is byte 2 through byte N + 1. The length,
                      which is in bytes, does not include the Status, Check
                      Sum, and Inverted Slave Addr fields.
  Write Request       Specifies the length of the data immediately following
                      this byte, that is byte 2 through byte N + 1. The length,
                      which is in bytes, does not include the Status, Check
                      Sum, and Inverted Slave Addr fields.
  Write Response      Always specified as 0.
Data Byte 1
  through
Data Byte N           Specifies the data in a read request and response, and a
                      write request.
Status                Specifies whether or not this command executes
                      successfully. A non-zero entry indicates a failure.
Check Sum             Specifies a direction control byte to ensure the integrity
                      of a message on the wire.
Inverted Slave Addr   Specifies the Slave Addr, which is inverted.
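
By way of illustration only, the packing of several of the fields summarized above into a read request may be sketched as follows. The bit positions of the Slave Addr, LSBit and MSBit fields follow the summary; the order of the bytes, the placement of the Type field in the low seven bits of byte 1, and the check sum algorithm (an 8-bit sum) are assumptions made for the example, since the figures defining the actual layout are not reproduced here. All function names are hypothetical.

```python
def address_byte(slave_addr: int, receiving: bool) -> int:
    """Pack the 7-bit Slave Addr into bits 7..1 and the LSBit into bit 0."""
    if not 0 <= slave_addr <= 0x7F:
        raise ValueError("Slave Addr is 7 bits wide")
    return (slave_addr << 1) | int(receiving)


def read_request(slave_addr: int, data_type: int,
                 command_id: int, expected_len: int) -> bytes:
    """Build a read request under the assumed byte order described above."""
    if not 0 <= data_type <= 0x7F:
        raise ValueError("Type must fit in 7 bits in this sketch")
    if not 0 <= command_id <= 0xFFFF:
        raise ValueError("Command ID is two bytes")
    if not 0 <= expected_len <= 0xFF:
        raise ValueError("Length is one byte")
    header = bytes([
        address_byte(slave_addr, receiving=False),  # master transmits the request
        0x80 | data_type,                # MSBit set (1) marks a read command
        command_id & 0xFF,               # Command ID (LSB)
        (command_id >> 8) & 0xFF,        # Command ID (MSB)
        expected_len,                    # Length (N) expected in the response
    ])
    check_sum = sum(header) & 0xFF       # assumed algorithm, not from the text
    return header + bytes([check_sum])
```

For instance, under these assumptions a read request to slave address 10h for command 1234h, data type 02h and four expected data bytes would begin with the byte 20h (address shifted left by one bit, LSBit clear) followed by 82h (MSBit set, type 02h).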



Referring again to FIG. 2, it is seen that the system 
interface 105 further includes a command and status register 
(CSR) 209 which controls operations and reports on the 
status of commands. The operation and functionality of CSR 
209 is described in further detail below. Both synchronous 
and asynchronous I/O modes are provided by the system 
interface 105. Thus, an interrupt line 211 is coupled between 
the system interface processor 201 and the ISA bus 103 and 
provides the ability to request an interrupt when asynchro- 
nous I/O is complete, or when an event occurs while the 
interrupt is enabled. As shown in FIG. 2, in one embodiment, 
the address of the interrupt line 211 is fixed and indicated as 
IRQ 15 which is an interrupt address number used specifi- 
cally for the ISA bus 103. 
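
The event memory of the system interface processor 201, organized as a bit vector in which each bit represents a particular type of event, may be sketched as follows. The class name, the bit assignments (FAN, TEMPERATURE) and the clearing of the vector once it has been read are assumptions made for the example; the disclosure states only that the vector has at least sixteen bits and that writing an event sets the corresponding bit.

```python
from typing import List

FAN = 0           # hypothetical bit assignment
TEMPERATURE = 1   # hypothetical bit assignment


class EventMemory:
    """Sixteen-bit event vector; each bit position stands for one event type."""

    WIDTH = 16

    def __init__(self) -> None:
        self.bits = 0

    def write_event(self, bit: int) -> None:
        """Set the bit that represents the event being reported."""
        if not 0 <= bit < self.WIDTH:
            raise ValueError("event bit out of range")
        self.bits |= 1 << bit

    def pending(self) -> bool:
        """Status check performed by the CPU after an interrupt or when polling."""
        return self.bits != 0

    def read_and_clear(self) -> List[int]:
        """Return the set bit positions, then clear the vector (assumed behavior)."""
        events = [i for i in range(self.WIDTH) if (self.bits >> i) & 1]
        self.bits = 0
        return events
```

In such a sketch, the CPU 101 would call pending() upon receiving an interrupt on the interrupt line 211, or periodically when polling, and would then call read_and_clear() to ascertain which types of event occurred.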

The MDR 207 and the request and response buffers 203 
and 205, respectively, transfer messages between a system 
operator or client and the failure reporting system of the 
invention. The buffers 203 and 205 are configured as first-in 
first-out (FIFO) buffers. That is, in these buffers, the next 
message processed is the one that has been in the queue the 
longest time. The buffers 203 and 205 have two functions: 
(1) they match speeds between the high-speed ISA bus 103 
and the slower system bus 107 (FIG. 1); and (2) they serve
as interim buffers for the transfer of messages. This relieves 
the system interface processor 201 of having to provide this 
buffer. 

When the MDR 207 is written to by the ISA bus 103, it 
loads a byte into the request buffer 203. When the MDR 207 
is read from the ISA bus 103, it unloads a byte from the 
response buffer 205. The system interface processor 201 
reads and executes the request from the request buffer 203 
when a message command is received in the CSR 209. A 
response message is written to the response buffer 205 when 
the system interface processor 201 completes executing the 
command. The system operator or client can read and write 
message data to and from the buffers 203 and 205 by 
executing read and write instructions through the MDR 207. 
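
By way of illustration only, the data path just described can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the firmware of the system interface 105: the class name `SystemInterfaceModel`, the method names, and the `handler` callback standing in for the system interface processor 201 are all hypothetical.

```python
from collections import deque


class SystemInterfaceModel:
    """Toy model of the MDR 207 and the FIFO buffers 203 and 205."""

    def __init__(self):
        self.request_buffer = deque()   # request buffer 203 (first-in first-out)
        self.response_buffer = deque()  # response buffer 205 (first-in first-out)

    def write_mdr(self, byte):
        # A write to the MDR loads one byte into the request buffer.
        self.request_buffer.append(byte & 0xFF)

    def read_mdr(self):
        # A read of the MDR unloads the oldest byte from the response buffer.
        return self.response_buffer.popleft()

    def execute_message_command(self, handler):
        # Stand-in for the system interface processor 201: take the queued
        # request, run it, and queue the response bytes for the client.
        request = bytes(self.request_buffer)
        self.request_buffer.clear()
        self.response_buffer.extend(handler(request))


iface = SystemInterfaceModel()
for b in b"\x01\x02\x03":
    iface.write_mdr(b)
# Hypothetical command handler that simply echoes the request reversed.
iface.execute_message_command(lambda req: bytes(reversed(req)))
reply = bytes(iface.read_mdr() for _ in range(3))
```

The client only ever touches `write_mdr` and `read_mdr`, which mirrors how all message data passes through the single MDR 207.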

The CSR 209 has two functions. The first is to issue 
commands, and the second is to report on the status of 
execution of a command. The commands in the system 
interface 105 are usually executed synchronously. That is, 
after issuing a command, the client must continue to poll the 
CSR status to confirm command completion. In addition to 
synchronous I/O mode, the client can also request an asynchronous 
I/O mode for each command by setting an "Asyn 
Req" bit in the command. In this mode, an interrupt is 
generated and sent to the ISA bus 103, via the interrupt line 
211, after the command has completed executing. 
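
The synchronous mode can be pictured as a simple polling loop. The following Python sketch is illustrative only; the bit position `CSR_BUSY`, the command value, and the `write_csr` and `read_csr` callables are assumptions, since the layout of the CSR 209 is not given in this passage.

```python
CSR_BUSY = 0x01  # hypothetical "command still executing" status bit


def issue_and_poll(write_csr, read_csr, command, max_polls=1000):
    """Issue a command, then poll the CSR until the busy bit clears.

    Returns the number of status reads that were needed.
    """
    write_csr(command)
    for polls in range(1, max_polls + 1):
        if not read_csr() & CSR_BUSY:
            return polls
    raise TimeoutError("command did not complete")


# Simulated CSR that reports busy for the first three status reads.
_status = iter([CSR_BUSY, CSR_BUSY, CSR_BUSY, 0])
written = []
polls_needed = issue_and_poll(written.append, lambda: next(_status), 0x10)
```

In the asynchronous mode the loop is unnecessary, because completion is announced by the interrupt on line 211 instead.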

The interrupt line 211 may use an ISA IRQ 15 protocol, 
as mentioned above, which is well-known in the art. 
Alternatively, the interrupt line 211 may utilize a level-triggered 
protocol. In a system which utilizes the level-triggered 
interrupt, it is a particular level of a signal, either high or 
low, which represents the interrupt signal, and the request 
remains asserted for as long as the signal is held at that 
level. In contrast, an edge-triggered interrupt, for example, 
is recognized by the signal level transition. That is, an 
interrupt is detected when the signal changes from either a 
high level to a low level, or vice versa, regardless of the 
resulting signal level. A client can either 
enable or disable the level-triggered interrupt by sending 
"Enable Ints" and "Disable Ints" commands. If the interrupt 
line is enabled, the system interface processor sends an 
interrupt signal to the ISA bus 103, either when an asyn- 
chronous I/O is complete or when an event has been 
detected. 
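
The difference between the two protocols can be shown on a sampled signal. The Python sketch below is a simplified illustration, assuming the line is sampled at discrete instants and that a high level is the active level; neither assumption comes from the text above.

```python
def level_triggered(samples, active_level=1):
    """Return the sample indices at which a level-triggered input is asserted."""
    return [i for i, s in enumerate(samples) if s == active_level]


def edge_triggered(samples):
    """Return the sample indices at which the signal changes level."""
    return [i for i in range(1, len(samples)) if samples[i] != samples[i - 1]]


line = [0, 0, 1, 1, 1, 0]
level_hits = level_triggered(line)  # asserted while the line stays high
edge_hits = edge_triggered(line)    # one hit per transition, either direction
```

The level-triggered view reports the request for every sample at the active level, whereas the edge-triggered view reports only the two transitions.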

In the embodiment shown in FIG. 2, the system interface 
105 may be a single-threaded interface. That is, only one 
client, or system operator, is allowed to access the system 
interface 105 at a time. Therefore, a program or application 
must allocate the system interface 105 for its use before 
using it, and then deallocate the interface 105 when its 
operation is complete. The CSR 209 indicates which client 
or operator is allocated access to the system interface 105 at 
a particular time. 
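
The allocate-before-use discipline can be sketched as follows. This Python example is an assumption-laden illustration: the class `SingleThreadedInterface`, its `owner` attribute (playing the role of the ownership indication in the CSR 209), and the client names are hypothetical.

```python
from contextlib import contextmanager


class SingleThreadedInterface:
    def __init__(self):
        self.owner = None  # which client currently holds the interface

    def allocate(self, client):
        if self.owner is not None:
            raise RuntimeError(f"interface already allocated to {self.owner}")
        self.owner = client

    def deallocate(self, client):
        if self.owner != client:
            raise RuntimeError("only the current owner may deallocate")
        self.owner = None

    @contextmanager
    def session(self, client):
        # Allocate on entry and always deallocate on exit.
        self.allocate(client)
        try:
            yield self
        finally:
            self.deallocate(client)


iface = SingleThreadedInterface()
with iface.session("client-A"):
    try:
        iface.allocate("client-B")  # a second client is refused
        second_client_admitted = True
    except RuntimeError:
        second_client_admitted = False
```

Wrapping the pair of calls in a context manager is one way to make sure the interface is released even if the client's operation fails.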

A further discussion of the structure and operation of the 
system interface 105 may be found in a copending and 
commonly owned patent application entitled, "I2C To 
ISA Bus Interface," which is listed in Appendix A attached 
hereto. 

FIG. 4 illustrates a system block diagram of one embodi- 
ment of the remote interface 119 of FIG. 1. As described 
above, the remote interface 119 serves as an interface which 
handles communications between the server system 100 
(FIG. 1) and an external computer, such as a local computer 
123 or a remote computer 125. The local computer 123 is 
typically connected to the remote interface 119, via a local 
communication line 127 such as an RS232 line, and the 
remote computer 125 is typically connected to the remote 
interface 119 by means of a modem connection line 133 
which connects the remote modem 129 to the server modem 
131. 

As shown within the dashed lines in FIG. 4, the remote 
interface 119 comprises a remote interface processor 401, a 
remote interface memory 403, a transceiver 405 and an 
RS232 port 407. The remote interface processor 401 is 
coupled to the system bus 107 and receives an event signal 
from the controller 109 (FIG. 1) when a failure condition has 
been detected. In one embodiment, the remote interface 
processor 401 is a PIC16C65 controller chip which includes 
an event memory (not shown) organized as a bit vector, 
having at least sixteen bits. Each bit in the bit vector 
represents a particular type of event. Writing an event to the 
remote interface processor 401 sets a bit in the bit vector that 
represents the event. The remote interface memory 403 is 
coupled to the remote interface processor 401 for receiving 
and storing event data, commands, and other types of data 
transmitted to the remote interface 119. In one embodiment, 
the remote interface memory 403 is a static random access 
memory (SRAM). 
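
The event memory can be pictured as an integer used as a bit vector. In the Python sketch below, the particular bit assignments (`EVENT_FAN`, `EVENT_TEMPERATURE`) are hypothetical; the text states only that each bit represents one type of event and that there are at least sixteen bits.

```python
EVENT_BITS = 16  # the text specifies at least sixteen bits

# Hypothetical event-type assignments; the source does not list them.
EVENT_FAN = 0
EVENT_TEMPERATURE = 1


def set_event(vector, event):
    """Writing an event sets the bit that represents it."""
    if not 0 <= event < EVENT_BITS:
        raise ValueError("unknown event type")
    return vector | (1 << event)


def pending_events(vector):
    """List the event types whose bits are currently set."""
    return [bit for bit in range(EVENT_BITS) if vector & (1 << bit)]


vector = 0
vector = set_event(vector, EVENT_TEMPERATURE)
vector = set_event(vector, EVENT_FAN)
```

A client that reads the vector can then decode the pending event types with `pending_events(vector)`.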

In order to communicate with external devices, the remote 
interface 119 further includes the transceiver 405, coupled to 
the remote interface processor 401, for receiving and trans- 
mitting data between the remote interface processor 401 and 
a local PC 123 or a remote/client PC 125, in accordance with 
a specified communication protocol. One embodiment of 
such a communication protocol is described in further detail 
below. In one embodiment, the transceiver 405 is an 
LT1133A signal processing chip. Coupled to the transceiver 
405 is an RS232 communication port 407 which is well-known in 
the art for providing data communications between com- 
puter systems in a computer network. One of the functions 
of the transceiver 405 is to transpose signal levels from the 
remote interface processor 401 to RS232 signal protocol 
levels. 

The remote interface 119 is coupled to a switch 121 for 
switching access to the remote interface 119 between a local 
computer 123 and a remote PC 125. The switch 121 receives 
command signals from the remote interface processor 401 
and establishes connectivity to the RS232 communication 
port 407 based on these command signals. Upon receiving 
an event signal, the remote interface processor 401 will set 
the connectivity of the switch 121 based on criteria such as 
the type of event that has been detected. If the switch 121 is 
set to provide communications between the local PC 123 
and the remote interface 119, after receiving an event signal, 
the remote interface processor 401 transmits a Ready To 
Receive (RTR) signal to the local computer 123. A software 
program which is stored and running in the local computer 
123 recognizes the RTR signal and sends back appropriate 
commands in order to interrogate the remote interface 
processor 401. In one embodiment, the software program 
which is stored and executed by the local computer 123 is 
the Maestro Recovery Manager software program, manu- 
factured by Netframe, Inc. Upon interrogating the remote 
interface processor 401, the local computer 123 detects that 
an event signal has been received by the remote interface 
119. The local computer 123 may then read the bit vector in 
the remote interface processor 401 to ascertain the type of 
event that occurred and thereafter notify a local user of the 
event by displaying an event message on a monitor coupled 
to the local computer 123. After the local user has been 
notified of the event, as described above, he or she may then 
obtain further information about the system failure which 
generated the event signal by accessing the system log 117 
(FIG. 1) from the local computer 123 via the remote 
interface 119. This capability is also provided by the Mae- 
stro Recovery Manager software program. 

If the switch 121 is set to provide connectivity to the 
remote/client computer 125 via a modem-to-modem 
connection, a server modem 131 will dial the modem 
number (telephone number) corresponding to the client 
modem 129 in order to establish a communication link with 
the remote computer 125. In one embodiment, the number of 
the client modem 129 is stored in the system log 117 (FIG. 
1) and accessed by the remote interface processor 401 upon 
receiving specified event signals. When the client modem 
129 receives "a call" from the server modem 131, the remote 
computer 125 will send back appropriate commands and/or 
data in order to interrogate the remote interface processor 
401 in accordance with a software program running on the 
remote computer 125. In one embodiment, this software 
program is the Maestro Recovery Manager software pro- 
gram manufactured by Netframe, Inc. Upon interrogating 
the processor 401, the remote computer 125 will detect that 
an event signal has been transmitted to the remote interface 
119. The remote computer 125 may then read the bit vector 
in the remote interface processor 401 to ascertain the type of 
event that occurred and thereafter notify a remote user of the 
event by displaying an event message on a monitor coupled 
to the remote computer 125. At this point, a remote user, 
typically a client authorized to have access to the server 
system 100, may obtain further information about the failure 
condition which generated the event signal by accessing the 
system log 117 (FIG. 1) from the remote computer 125 via 
the remote interface 119. 

In one embodiment, the remote interface communication 
protocol is a serial protocol that communicates messages 
across a point-to-point serial link. This link is between the 
remote interface processor 401 and a local or remote client. 
The protocol encapsulates messages in a transmission packet 
to provide error-free communication and link security and 
further uses the concept of "byte stuffing" in which certain 
byte values in a data stream always have a particular 
meaning. Examples of bytes that have a special meaning in 
this protocol are: 

SOM: Start of a message 
EOM: End of a message 

SUB: The next byte in the data stream must be substituted 
before processing. 

INT: Event Interrupt 

Data: An entire Message 

The remote interface serial protocol uses two types of 
messages: (1) requests, which are sent by remote manage- 
ment systems (PCs) to the Remote Interface; and (2) 
responses, which are returned to the requester by the Remote 
Interface. The formats of these messages are illustrated in 
FIGS. 5A-5C. 
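
As a rough illustration of the packet encapsulation, the Python sketch below frames a message body between SOM and EOM, applies byte stuffing, and appends a check byte chosen so that all message bytes, including the check byte, sum to zero modulo 256, as stated in the field summary for FIGS. 5A-5C. The numeric values of SOM, EOM and SUB, the XOR substitution rule, and the choice to compute the check byte over the unstuffed body are assumptions made for the example; the text does not specify them.

```python
SOM, EOM, SUB = 0x7B, 0x7D, 0x5C  # hypothetical special byte values
SPECIAL = {SOM, EOM, SUB}
STUFF_MASK = 0x20                  # hypothetical substitution rule (XOR)


def check_byte(body):
    """Value that makes the sum of all message bytes zero modulo 256."""
    return (256 - sum(body)) % 256


def encode(body):
    payload = bytes(body) + bytes([check_byte(body)])
    out = bytearray([SOM])
    for b in payload:
        if b in SPECIAL:
            # Escape any data byte that collides with a special value.
            out += bytes([SUB, b ^ STUFF_MASK])
        else:
            out.append(b)
    out.append(EOM)
    return bytes(out)


def decode(packet):
    if packet[0] != SOM or packet[-1] != EOM:
        raise ValueError("missing SOM or EOM")
    payload = bytearray()
    stream = iter(packet[1:-1])
    for b in stream:
        # After SUB, the next byte is substituted before processing.
        payload.append(next(stream) ^ STUFF_MASK if b == SUB else b)
    if sum(payload) % 256 != 0:
        raise ValueError("check byte mismatch")
    return bytes(payload[:-1])
```

With these definitions, `decode(encode(body))` returns `body` for any byte string, and a single corrupted byte in the payload is rejected by the check-byte test.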

The following is a summary of the fields within each of 
the messages shown in FIGS. 5A-5C: 



SOM          A special data byte value marking the start of a message.
EOM          A special data byte value marking the end of a message.
Seq. #       A one-byte sequence number, which is incremented on each
             request. It is stored in the response.
TYPE         One of the following types of requests:
  IDENTIFY   Requests the remote interface to send back identification
             information about the system to which it is connected.
             It also resets the next expected sequence number.
             Security authorization does not need to be established
             before the request is issued.
  SECURE     Establishes secure authorization on the serial link by
             checking password security data provided in the message
             with the server system password.
  UNSECURE   Clears security authorization on the link and attempts to
             disconnect it. This requires security authorization to
             have been previously established.
  MESSAGE    Passes the data portions of the message to the remote
             interface for execution. The response from the remote
             interface is sent back in the data portion of the response.
             This requires security authorization to have been
             previously established.
  POLL       Queries the status of the remote interface. This request
             is generally used to determine if an event is pending in
             the remote interface.
STATUS       One of the following response status values:
  OK         Everything relating to communication with the remote
             interface is successful.
  OK_EVENT   Everything relating to communication with the remote
             interface is successful. In addition, there is one or more
             events pending in the remote interface.
  SEQUENCE   The sequence number of the request is neither the
             current sequence number (a retransmission request) nor
             the next expected sequence number (a new request).
             Sequence numbers may be reset by an IDENTIFY
             request.
  CHECK      The check byte in the request message is received
             incorrectly.
  FORMAT     Something about the format of the message is incorrect.
             Most likely, the type field contains an invalid value.
  SECURE     The message requires that security authorization be in
             effect. Or, if the message has a TYPE value of
             SECURE, the security check failed.
Check        Indicates a message integrity check byte. Currently the
             value is 256 minus the sum of the previous bytes in the
             message, modulo 256. For example, adding all bytes in the
             message, up to and including the check byte, should
             produce a result of zero (0).
INT          A special one-byte message sent by the Remote Interface
             when it detects the transition from no events pending to one
             or more events pending. This message can be used to
             trigger reading events from the remote interface. Events
             should be read until the return status changes from
             OK_EVENT to OK.

In one embodiment, the call-out protocol of the remote 
interface is controlled by a software code called Callout 
Script. The Callout Script controls actions taken by the 
remote interface 119 when it is requested to make a callout 
to a local or remote computer, 123 or 125, respectively. The 
script is a compact representation of a simple scripting 
language that controls the interaction between a modem and 
a remote system. Because the script keyword fields are 
bytes, it requires a simple compiler to translate from text to 
the script. The script is stored in the system recorder 115 
(FIG. 1) and is retrieved by the remote interface 119 when 
needed. The following is a summary of some of the fields of 
the callout script: 

Field     Data               Function

Label     Label Value        Establishes a label in the script.
Goto      Label Value        Transfers control to a label.
Speed     Speed Value        Sets the remote interface speed to the specified
                             value.
Send      Data String        Sends the data string to the serial interface.
Test      Condition, label   Tests the specified condition and transfers to
                             label if the test is true.
Trap      Event, label       Establishes or removes a trap handler address for
                             a given event.
Search    Data string,       Searches for a specific data string in the
          label value        receiving buffer. If the data string is found,
                             removes the data up to and including this string
                             from the buffer, then transfers to label.
Control   Control            Takes the specified control action.
Wait      .1-25.5 sec.       Delays execution of the script for the specified
                             time.
Exit      OK, Fail           Terminates script processing and exits with a
                             status and log result.
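
To make the callout script fields more concrete, the following Python sketch interprets a small subset of them (Label, Goto, Send, Search and Exit). It is an illustration only: the tuple-based script form, the function `run_script`, and the dial string are hypothetical, and the byte-coded script format actually stored in the system recorder 115 is not reproduced here.

```python
def run_script(script, receive_buffer, max_steps=100):
    """Interpret a tiny subset of the callout script keywords.

    `script` is a list of (keyword, *args) tuples; returns (status, sent).
    """
    labels = {ins[1]: pc for pc, ins in enumerate(script) if ins[0] == "Label"}
    sent, pc = [], 0
    for _ in range(max_steps):
        if pc >= len(script):
            break
        op, *args = script[pc]
        pc += 1
        if op == "Goto":
            pc = labels[args[0]]
        elif op == "Send":
            sent.append(args[0])
        elif op == "Search":
            text, label = args
            pos = receive_buffer.find(text)
            if pos >= 0:
                # Remove data up to and including the matched string.
                receive_buffer = receive_buffer[pos + len(text):]
                pc = labels[label]
        elif op == "Exit":
            return args[0], sent
        # "Label" needs no action at run time; other keywords are omitted.
    return "Fail", sent


script = [
    ("Send", "ATDT5551234\r"),        # hypothetical dial string
    ("Search", "CONNECT", "online"),
    ("Exit", "Fail"),
    ("Label", "online"),
    ("Exit", "OK"),
]
status, sent = run_script(script, "\r\nCONNECT 9600\r\n")
```

Here the Search field finds the modem's connect message in the receive buffer and transfers to the `online` label, so the script exits with an OK status; without that message it would fall through to the Fail exit.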



A further description of the remote interface 119 can be 
found in a copending and commonly owned U.S. patent 
application entitled, "System Architecture For Remote 
Access And Control of Environmental Management," which 
is listed in Appendix A attached hereto. 

Referring to FIG. 6, a block diagram of one embodiment 
of the system recorder 115 of FIG. 1 is illustrated. The 
system recorder 115 is enclosed by the dashed lines and 
includes a system recorder processor 601 and a real-time 
clock chip 603. In one embodiment, the system recorder 
processor is a PIC chip, part no. PIC16C65, manufactured 
by Microchip Technologies, Inc., and the real-time clock 
chip 603 is a Dallas 1603 IC Chip, manufactured by Dallas 
Semiconductor, Inc. of Dallas, Tex., which includes a 
four-byte counter which is incremented every second. Since 
there are 32 bits, the real-time clock chip 603 has the 
capacity of recording the time for more than 100 years 
without having to be reset. It also has battery backup power, 
so if the power goes off, it continues to "tick." The real-time 
clock chip 603 records "absolute" time. In other words, it 
does not record time in terms of the time of day in a 
particular time zone, nor does it reset when the time in the 
real world is reset forward or back one hour for daylight 
savings. The operating system must get a reference point for 
its time by reading the real-time clock chip 603 and then 
synchronizing it with real world time. 
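
The capacity figure follows from simple arithmetic, and the synchronization step amounts to remembering one offset. The Python sketch below illustrates both; the helper `make_clock_reference` and the sample tick and wall-clock values are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# A 32-bit counter incremented once per second spans roughly 136 years.
counter_range_years = 2**32 / SECONDS_PER_YEAR


def make_clock_reference(rtc_ticks, wall_clock_seconds):
    """Return a function mapping raw counter values to wall-clock seconds.

    The pair (rtc_ticks, wall_clock_seconds) is the reference point taken
    when the operating system synchronizes with real world time.
    """
    offset = wall_clock_seconds - rtc_ticks
    return lambda ticks: ticks + offset


to_wall_clock = make_clock_reference(rtc_ticks=1_000,
                                     wall_clock_seconds=900_000_000)
```

Because the counter itself is never adjusted, time zone and daylight savings handling stay entirely in the operating system's conversion.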

The system recorder processor 601 is coupled to the 
system bus 107. When a failure condition is detected by the 
controller 109 (FIG. 1), the controller 109 transmits failure 
information related to the detected failure condition to the 
system recorder processor 601. This failure information may 
include the values of out-of-tolerance operational parameters 
such as fan speed or a system temperature, for example. 
Upon receiving this failure information, the system recorder 
processor 601 queries the real-time clock chip 603 for a time 
value which is stored in the four-byte field within the chip 603. 
The real-time clock chip 603 transmits the value of this 
four-byte field to the processor 601 whereupon the processor 
601 "stamps" the failure information with this time value. 
The time value is included as part of the failure information 
which is subsequently stored in the system log 117. 

In order to store data into the system log 117, the system 
recorder processor 601 must obtain the address of the next 
available memory space within the system log 117 and set a 
pointer to that address. The system recorder processor 601 is 
coupled to the system log 117 by means of an address bus 
606 and a data bus 607. Prior to storing or retrieving data 
from the system log, the processor 601 communicates with 
the system log 117 in order to ascertain the addresses of 
relevant memory locations in or from which data is to be 
either stored or retrieved. Upon receiving an address, the 
processor 601 can proceed to store or retrieve data from the 
corresponding memory space, via the data bus 607. FIGS. 
7A-7D illustrate a flowchart of one embodiment of a 
process of reading data from and writing data to the system 
log. 

Referring now to FIGS. 7A-7D, a flow chart illustrates 
one embodiment of a method by which the system recorder 
115 (FIG. 1) stores and retrieves information from the 
system log 117. In the embodiment discussed below, the 
system log 117 is a non-volatile random access memory 
(NVRAM) and is referred to as NVRAM 117. In FIG. 7A, 
at step 700, the system recorder 115 is typically in an idle 
state, i.e., waiting for commands from other microcontrollers 
in the network. At step 702, the system recorder 115 
determines if an interrupt command is detected from other 
microcontrollers. If no interrupt command is detected, then 
at step 704, the system recorder 115 checks if a reset 
command is pending. A reset command is a request to clear 
all the memory cells in the NVRAM 117. If a reset command 
is detected, then at step 706, the system recorder 115 clears 
all memory cells in the NVRAM 117 and returns to its idle 
state at step 700, and the entire process repeats itself. If a 
reset command is not detected, then at step 708, the system 
recorder 115 updates the time stored in the real-time clock 
chip 603 (FIG. 6) every one second. At this step, the system 
recorder 115 reads the real time clock and saves the real time 
in a local register (not shown). 

If, at step 702, an interrupt command is detected from 
other microcontrollers, the system recorder 115 determines 
the type of data in the interrupt command at step 710. For the 
purpose of logging message events in the NVRAM 117, the 
log data and event data type are pertinent. As noted above, 
the log data type is used to write a byte string to a circular 
log buffer, such as the NVRAM 117. The log data type 
records system events in the NVRAM 117. The maximum 
number of bytes that can be written in a log entry is 249 
bytes. The system recorder 115 adds a total of six bytes at the 
beginning of the interrupt command: a two-byte identification 
code (ID), and a four-byte timestamp for recording the 
real time of the occurrence of the system event. 
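
The resulting log entry layout can be sketched with Python's `struct` module. The field order (ID first, then timestamp) follows the sentence above; the big-endian byte order, the function name `build_log_entry`, and the sample values are assumptions, since the text does not state them.

```python
import struct

MAX_LOG_DATA = 249  # maximum bytes a caller may write in one log entry


def build_log_entry(entry_id, timestamp, data):
    """Prefix `data` with a two-byte ID and a four-byte timestamp."""
    if len(data) > MAX_LOG_DATA:
        raise ValueError("log data exceeds 249 bytes")
    # Big-endian byte order is an assumption; the source does not state it.
    return struct.pack(">HI", entry_id, timestamp) + bytes(data)


entry = build_log_entry(0x0102, 0x00000E10, b"fan 2 stopped")
```

With the six-byte prefix, the largest possible entry is 255 bytes, so an entry length fits in a single byte.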

With special firmware, the NVRAM 117 is divided into 
two blocks: a first block having 64 kbytes of memory space, 
and a second block having 64 kbytes of memory space. The 
first block of the NVRAM 117 is a fixed-variable memory 
block which stores ID codes of the devices installed in the 
network as well as other information. The second block is a 
memory block which stores message codes in connection 
with events occurring in the network. The NVRAM 117 may 
be based upon devices manufactured by Dallas Semiconductor 
Corporation, e.g., the DS1245Y/AB 1024K Nonvolatile 
SRAM. 

Based on the interpretation of the data type at step 712, the 
system recorder 115 determines whether the interrupt command 
is intended to be sent to the first block or second block 
of the NVRAM 117. If the interrupt command is intended to 
be sent to the first block of NVRAM 117, then the process 
described in FIG. 7B is followed. If the interrupt command 
is not intended to be sent to the first block of NVRAM 117, 
then it is intended to be sent to the second block of NVRAM 
117. At step 714, the system recorder 115 determines 
whether the interrupt command is a read or write command 
for the second block. If the interrupt command is a read 
command, then the process described in FIG. 7C is followed. 
If the interrupt command is not a read command, then 
it is a write command and the process described in FIG. 7D 
is followed. 

Referring to FIG. 7B, a flow chart is provided for describing 
the steps of performing a read from and/or write to the 
first block of the NVRAM 117. As noted above, the first 
block of the NVRAM 117 is a 64-kbyte memory block. The 
first block is a fixed-variable memory block which stores ID 
codes of the devices installed in the network. Hence, a 
command addressed to the first block is typically generated 
by a controller (e.g., chassis controller 112 of FIG. 1) 
responsible for updating the presence or absence of devices 
in the network. The process described in FIG. 7B is followed 
when, at step 712 (shown in FIG. 7A), the system recorder 
115 determines that the interrupt command is intended to be 
sent to the first block of the NVRAM 117. 

As shown in FIG. 7B, at step 718, the system recorder 115 
determines whether the interrupt command is to read from or 
write to the NVRAM 117. If the interrupt command is a read 
command, then at step 720, the system recorder 115 loads 
the address pointer at the intended address location in 
NVRAM 117. At step 722, the system recorder 115 reads the 
intended message from the address location in the NVRAM 
117, and forwards the read data to the master device (i.e., 
device requesting the read operation) in the network. After 
the read operation is complete, at step 728, the system 
recorder 115 issues an interrupt return command to return to 
its idle state at step 700 (shown in FIG. 7A). 

If at step 718 the system recorder 115 determines that the 
interrupt command is a write command, then at step 724, the 
system recorder 115 loads the address pointer at the intended 
address location in NVRAM 117. The system recorder 115 
preferably checks on the availability of memory space in 
NVRAM 117 prior to executing a write operation (see FIG. 
7D for details). At step 726, the system recorder 115 writes 
the event message to the address location in the NVRAM 
117, and forwards a confirmation to the master device in the 
network. After the write operation is complete, at step 728, 
the system recorder 115 issues an interrupt return command 
to return to its idle state at step 700 (shown in FIG. 7A). 

Referring now to FIG. 7C, a flow chart is provided for 
describing the steps of performing a read operation from the 
second block of the NVRAM 117. As noted above, the 
second block of the NVRAM 117 is a 64-kbyte memory 
block. The second block is a memory block which stores 
event messages in connection with events occurring in the 
network. Hence, a command addressed to the second block 
is typically generated by a controller responsible for updating 
the occurrence of such events. The process described in 
FIG. 7C is followed when, at step 714 (shown in FIG. 7A), 
the system recorder 115 determines that the interrupt command 
is a read command intended for the second block of the 
NVRAM 117. 

As shown in FIG. 7C, if the system recorder 115 determines 
that the interrupt command is a read operation, then 
at step 730, the system recorder 115 loads an address pointer 
to the intended address in the second block of NVRAM 117. 
At step 732, the system recorder 115 performs a read 
operation of the first logged message from the NVRAM 117 
commencing with the intended address location. For a read 
operation, it is preferable that only the 65534 (FFFEh) and 
65533 (FFFDh) addresses be recognized. The address 65534 
specifies the address of the oldest valid message. The 
address 65533 specifies the address of the next message 
following the last message read from the log in NVRAM 
117. The last address in the second block of the NVRAM 
117 is 65279 (FEFFh). This is also the address at which the 
system recorder 115 performs a pointer wrap operation (see 
FIG. 7D for details). In doing so, the system recorder 115 
redirects the address pointer to the beginning of the second 
block of the NVRAM 117. Hence, the address of the next 
message after the 65279 address is 0. To perform a 
read operation of the entire second block in chronological 
order, the timestamp is read first. Then, the message logged 
at address 65534 is read second. This message constitutes 
the first logged message. Then, the message logged at 
address 65533 is read next. This message is the next logged 
message. Then, the message logged at address 65533 is read 
again to read all subsequently logged messages. The reading 
at address 65533 continues until the status field returns a 
non-zero value such as 07H, for example. 
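
The chronological read sequence can be sketched as a loop over the two recognized addresses. In the Python example below, `LogModel` is a hypothetical stand-in for the system recorder's read interface, and the use of 07H as the terminating status simply follows the example value given above.

```python
ADDR_OLDEST = 0xFFFE  # 65534: oldest valid message
ADDR_NEXT = 0xFFFD    # 65533: message following the last one read
STATUS_OK = 0x00


class LogModel:
    """Stand-in for the system recorder's read interface."""

    def __init__(self, messages):
        self._messages = list(messages)  # oldest first
        self._cursor = 0

    def read(self, address):
        if address == ADDR_OLDEST:
            self._cursor = 0
        elif address != ADDR_NEXT:
            raise ValueError("only 65534 and 65533 are recognized")
        if self._cursor >= len(self._messages):
            return 0x07, b""  # non-zero status: nothing further to read
        message = self._messages[self._cursor]
        self._cursor += 1
        return STATUS_OK, message


def read_all(log):
    """Read the oldest message, then keep reading 'next' until status != 0."""
    status, message = log.read(ADDR_OLDEST)
    out = []
    while status == STATUS_OK:
        out.append(message)
        status, message = log.read(ADDR_NEXT)
    return out


history = read_all(LogModel([b"boot", b"fan fail", b"temp high"]))
```

The client never supplies a physical NVRAM address; the two special addresses let the system recorder track the read position on the client's behalf.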

At step 734, the system recorder 115 determines whether 
the address location has reached the end of the second block 
in the NVRAM 117. If the address location has not reached 
the end of the second block, then at step 736, the system 
recorder 115 performs a read operation of the next logged 
message using the addressing scheme described above. The 
system recorder 115 transmits all read messages to the 
master device via the I2C bus. If the address location has 
reached the end of the second block, then the system 
recorder 115 returns to its idle state 700 (shown in FIG. 7A). 

Referring now to FIG. 7D, a flow chart is provided for 
describing the steps of performing a write operation to the 
second block of the NVRAM 117. Typically, a command 
addressed to the second block is generated by a controller 
(e.g., chassis controller 222) responsible for updating the 
occurrence of such events. The process described in FIG. 7D 
is followed when, at step 714 (shown in FIG. 7A), the 
system recorder 115 determines that the interrupt command 
is a write command directed to the second block of the 
NVRAM 117. 

As shown in FIG. 7D, if the system recorder 115 determines 
that the interrupt command is a write command, then 
at step 740, the system recorder 115 loads an address pointer 
to the intended address in the second block of NVRAM 117. 
At step 742, the system recorder 115 determines whether a 
memory space is available in the second block of NVRAM 
117 to perform the requested write operation. If a memory 
space is not available in the second block, then at step 744, 
the system recorder 115 performs a pointer wrap operation. 
In doing so, the system recorder 115 redirects the address 
pointer to the beginning of the second block of the NVRAM 
117. The system recorder 115 erases the memory space 
corresponding to a single previously logged message which 
occupies that memory space. Additional previously logged 
messages are erased only if more memory space is required 
to perform the present write operation. 
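
The pointer wrap and selective erasure can be modeled as follows. This Python sketch is an illustration under assumptions: the class `CircularLog`, its bookkeeping of entries as (start, data) pairs, and the tiny ten-byte log used in the usage example are hypothetical and do not reflect the actual NVRAM layout.

```python
from collections import deque


class CircularLog:
    def __init__(self, size):
        self.size = size
        self.write_ptr = 0
        self.entries = deque()  # (start, data) pairs, oldest first

    def write(self, data):
        if len(data) > self.size:
            raise ValueError("message larger than the log")
        if self.write_ptr + len(data) > self.size:
            self.write_ptr = 0  # pointer wrap: restart at the block's beginning
        start, end = self.write_ptr, self.write_ptr + len(data)
        # Erase only those older messages that overlap the space needed now.
        while self.entries and self._overlaps(self.entries[0], start, end):
            self.entries.popleft()
        self.entries.append((start, bytes(data)))
        self.write_ptr = end

    @staticmethod
    def _overlaps(entry, start, end):
        entry_start, entry_data = entry
        return entry_start < end and start < entry_start + len(entry_data)


log = CircularLog(10)
for message in (b"aaaa", b"bbbb", b"cccc"):
    log.write(message)
surviving = [data for _, data in log.entries]
```

In this run the third message does not fit after the second, so the pointer wraps and only the oldest message is erased; the second message survives until its space is actually needed.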

If the system recorder 115 determines that a memory 
space is available in the second block of the NVRAM 117, 
then at step 746, the system recorder 115 fetches the time 
from the real-time clock 603 and stamps (i.e., appends) the 
real time to the message being written. As noted above, the 
real time comprises a four-byte field (i.e., 32 bits) which is 
appended to the message being written. At step 748, the 
system recorder 115 writes the time-stamped message to the 
second block of the NVRAM 117. At step 750, the system 
recorder 115 issues an interrupt return command to return to 
its idle state 700 (shown in FIG. 7A). 

A further description of the system recorder 115 and the 
NVRAM 117 can be found in a copending and commonly 
owned U.S. patent application entitled, "Black Box 
Recorder For Information System Events," which is listed in 
Appendix A attached hereto. 

FIGS. 8A-8D illustrate a flowchart of one embodiment of 
the process of reporting system failures in accordance with 
the invention. As the process is described below, reference is 
also made to FIG. 1 which illustrates a block diagram of one 
embodiment of the server system 100 which carries out the 
process shown in FIGS. 8A-8D. 

Referring to FIG. 8A, the process starts at location 800 50 
and proceeds to step 801 wherein a controller 109 monitors 
the server 100 for system failures. In step 803, a determi- 
nation is made as to whether any system failures have been 
detected. If in step 803, no failures have been detected, the 
process moves back to step 801 and the controller 109 55 
continues to monitor for system failures. If in step 803 a 
failure is detected, the process moves to step 805 in which 
the failure information is sent to the system recorder 115. In 
this step, the controller 109 sends failure information, such 
as the value of measured operation parameters which have 60 
been determined to be out of tolerance, to the system 
recorder 115 which assigns a time stamp to the failure event 
Next, in step 807, the system recorder 115 logs the failure by 
storing the failure information, along with its time stamp, in 
the system log 117. In step 809, an event signal is sent to the 65 
system interface 105 and to the remote interface 119. The 
process then moves to step 811 as shown in FIG. 8B. 
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Referring to FIG. 8B, in step 811, an interrupt signal is 
sent to the CPU 101 of the server system. Or, alternatively, 
the CPU 101 may be periodically monitoring the system 
interface 105 in which case the CPU 101 will detect that an 
event signal has been received by the system interface 105. 
In step 813, the CPU 101 reads the event from the system 
interface 105. Thereafter, in step 815, the CPU 101 notifies 
a system operator or administrator of the event who may 
then take appropriate measures to correct the failure condi- 
tion. In one embodiment, the CPU 101 may notify a system 
operator by displaying an error or event message on a 
monitor coupled to the CPU 101, or the CPU 101 may 
simply illuminate a light emitting diode (LED) which indi- 
cates that a system failure has been detected. At this point, 
the system operator may decide to ignore the event message 
or obtain more information about the event by accessing the 
system log 117 for the failure information which was stored 
in it in step 807. By means of operating system software 
executed by the CPU 101 and the communications protocol 
established by the system interface 105, the system operator 
can access this failure information from the system log 117. 
Additionally, the CPU 101 may take remedial actions on its 
own initiative (programming). For example, if a critical
system failure has been detected, e.g., a system temperature
is above a critical threshold, the CPU 101 may back up all
currently running files (core dump into back-up memory
space) and then shut down the server system. 
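
The two ways in which the CPU 101 may learn of an event in steps 811 and 813 (an interrupt, or periodic monitoring of the system interface 105) can be sketched as follows. The bit assignments, the event_pending flag, and the clear-on-read behavior are assumptions chosen for illustration; only the polling path is shown.

```python
from enum import IntFlag
from typing import Optional


class Event(IntFlag):
    # Hypothetical bit assignments; the disclosure does not define them.
    OVER_TEMPERATURE = 0x01
    FAN_FAILURE = 0x02
    POWER_SUPPLY = 0x04


class SystemInterface:
    """Hypothetical model of the system interface 105."""

    def __init__(self) -> None:
        self.bit_vector = Event(0)
        self.event_pending = False  # plays the role of a status flag

    def signal(self, event: Event) -> None:
        # An event signal changes the value of at least one bit.
        self.bit_vector |= event
        self.event_pending = True

    def read_events(self) -> Event:
        # Assumption: reading the events clears them.
        events = self.bit_vector
        self.bit_vector = Event(0)
        self.event_pending = False
        return events


def poll(interface: SystemInterface) -> Optional[Event]:
    """Polling alternative to the interrupt of step 811."""
    if not interface.event_pending:
        return None
    return interface.read_events()  # step 813
```

A caller would invoke poll at a fixed interval and, when it returns a value, test individual members such as Event.FAN_FAILURE to decide how to notify the operator in step 815.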

In step 817, the CPU 101 decides whether to call out to 
a local or remote computer in order to notify it of the event. 
Particular types of events may warrant a call-out to either a 
local or remote computer in order to notify important 
personnel or administrators of a particular problem, while 
other types of events may not. If in step 817 it is determined 
that the particular event does not warrant a call-out to a local 
or remote computer, the process ends at step 819. On the 
other hand, if the CPU 101 decides that a call-out is 
warranted, the process moves to step 821 as shown in FIG. 
8C. 
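
The decision of step 817 can be pictured as a lookup from event type to call-out target. The policy table below is purely hypothetical; the disclosure states only that some types of events warrant a call-out and others do not.

```python
from enum import Enum
from typing import Optional


class CallOut(Enum):
    LOCAL = "local"
    REMOTE = "remote"


# Hypothetical policy: which event types warrant a call-out, and to whom.
CALL_OUT_POLICY = {
    "over_temperature": CallOut.REMOTE,
    "fan_failure": CallOut.LOCAL,
}


def decide_call_out(event_type: str) -> Optional[CallOut]:
    """Return the call-out target, or None if the process ends at step 819."""
    return CALL_OUT_POLICY.get(event_type)
```

An event type absent from the table yields None, which corresponds to the branch in which no call-out is made.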

Referring to FIG. 8C, in step 821, the CPU 101 will 
determine whether the call-out is to be made to a local 
computer 123, connected to the server system 100 via a local 
communication line 127 such as an RS232 line, or to a
remote computer 125, connected to the server system 100 
via a modem-to-modem connection. If in step 821 it is 
determined that a call-out to a local computer 123 is to be 
made, the function of step 823 is implemented wherein the 
operating system sets the call-out switch 121 to the local 
connection mode. In step 825, the remote interface 119 
notifies the local computer 123 that an event signal has been 
received. Thereafter, in step 827, the local computer reads 
the event message from the remote interface 119. Upon 
reading the event message, in step 829, the local computer 
123 may notify a local user of the event condition and/or 
take other appropriate measures. Depending on the software 
program running on the operating system of the local 
computer, the local computer 123 may notify the local user 
by displaying an error or event message on a monitor of the 
local computer 123, or the local computer 123 may simply 
illuminate a light emitting diode (LED) which indicates that 
a system failure has been detected. At this point, the local 
user may decide to ignore the event message or obtain more 
information about the event by accessing the system log for 
the failure information which was stored in it in step 807. 
The local user may then contact appropriate personnel 
located at the site where the server is located and inform 
and/or instruct such personnel to remedy the problem. Or, 
the local user may travel to the site himself, or herself, in 
order to fix the problem. The process then ends at step 819. 

If in step 821 it is determined that a call-out is to be made
to a remote computer, the process proceeds to step 831 
wherein the call-out switch 121 is set to a remote connection 
mode. The process then moves to step 833 as shown in FIG. 
8D. In step 833, the CPU 101 of the server system determines
whether the remote computer 125 has security authorization to
receive the event information and access the system log. This
function may be accomplished by receiving a password from the
remote computer or receiving an encrypted identification signal
from the remote computer and verifying that it matches the
server's password or identification signal. However, other
methods of providing secure transmissions between a host system
and a remote system which are known in the art may be utilized
in accordance with the invention. If in step 833 security
authorization has not been established, the process ends at
step 819. However, if in step 833 security authorization is
established, the process proceeds to step 835, wherein the
remote interface 119 dials out through the modem-to-modem
connection to establish a communication link with the remote
computer 125. The dial-out number is automatically provided to
the remote interface 119 by the CPU 101, and in one embodiment
a list of dial-out numbers may be stored in the system log 117.
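
Steps 833 and 835 can be sketched as a credential comparison followed by a dial-out. The function name, the choice of the first stored number, and the dial callback are assumptions for illustration; the constant-time comparison from the Python standard library is one possible way of matching a received password against the server's and is not prescribed by the disclosure.

```python
import hmac
from typing import Callable, Sequence


def start_remote_call_out(received_id: bytes, server_id: bytes,
                          dial_out_numbers: Sequence[str],
                          dial: Callable[[str], None]) -> bool:
    """Return True if authorization succeeded and a dial-out was started."""
    # Step 833: compare the received password or identification signal
    # with the server's copy, in constant time.
    if not hmac.compare_digest(received_id, server_id):
        return False               # process ends at step 819
    # Step 835: assumption, the first stored number is the one dialed.
    dial(dial_out_numbers[0])
    return True
```

The dial callback stands in for the remote interface 119, so the sketch can be exercised without any modem hardware.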

In step 837, the remote interface 119 checks whether a good
communication link has been established by determining whether
data set ready (DSR) and data carrier detect (DCD) signals have
been communicated between a server modem 131 and a remote modem
129. The DSR and DCD signals are common signals used in
modem-to-modem handshake protocols. However, any protocol for
verifying an active modem-to-modem communication link which is
known in the art may be utilized in accordance with the
invention. If in step 837 it is determined that a good
communication link cannot be established, the process proceeds
to step 839, wherein the CPU 101 reports that the call-out
failed. The process then ends in step 819.
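
The link check of steps 837 and 839 reduces to testing two modem status lines. In the sketch below, ModemStatus and the report callback are hypothetical; a real implementation would read the DSR and DCD lines from the serial hardware.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModemStatus:
    """Hypothetical snapshot of the server modem's status lines."""
    dsr: bool  # data set ready
    dcd: bool  # data carrier detect


def link_is_good(status: ModemStatus) -> bool:
    # Step 837: both signals must be asserted.
    return status.dsr and status.dcd


def check_link(status: ModemStatus, report: Callable[[str], None]) -> bool:
    """Return True to continue with step 841, False if the call-out failed."""
    if not link_is_good(status):
        report("call-out failed")  # step 839
        return False
    return True
```

Keeping link_is_good separate makes it straightforward to substitute another verification protocol, as the text allows.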

If in step 837 it is determined that a good communication
link has been established, the remote interface 119, in step
841, notifies the remote computer 125 that an event signal
has been received. In step 843, the remote computer reads
the event from the remote interface 119 by reading a bit
vector within the remote interface 119. In step 845, after
reading the event in step 843, the remote computer 125
notifies a remote user of the event condition and/or takes
other appropriate measures. Depending on the software
program running on the operating system of the remote
computer 125, the remote computer 125 may notify a remote
user by displaying an error or event message on a monitor
of the remote computer 125, or the remote computer 125
may simply illuminate a light emitting diode (LED) which
indicates that a system failure has been detected. At this
point, the remote user may decide to ignore the event
message or obtain more information about the event by
accessing the system log for the failure information which
was stored in it in step 807. The process then ends at step
819.

As described above, the invention provides a fast and 
efficient method of detecting system failures and/or events 
and reporting such failures and events to a client, system
operator, or control center of a server system. By logging
failure information into a system log, a system operator or
client can ascertain the nature of a particular problem and
thereafter make an informed decision as to what steps may
be required to correct the system error or failure. By
providing this type of failure reporting system, the invention
alleviates much confusion and frustration on the part of
system users which would otherwise result. Additionally, by
quickly reporting such failures, the amount of downtime of 
the server system is reduced. 

The invention may be embodied in other specific forms 
without departing from its spirit or essential characteristics. 
The described embodiments are to be considered in all 
respects only as illustrative and not restrictive. The scope of 
the invention is, therefore, indicated by the appended claims, 
rather than by the foregoing description. All changes which 
come within the meaning and range of equivalency of the 
claims are to be embraced within their scope. 

Appendix A 

Incorporation by Reference of Commonly Owned 

Applications 

The following patent applications, commonly owned and 
filed on the same day as the present application are hereby 
incorporated herein in their entirety by reference thereto: 

"System Architecture for Remote
Access and Control of Environmental
Management"

"Method of Remote Access and
Control of Environmental 
Management" 

"System for Independent Powering of 
Diagnostic Processes on a Computer 
System" 

"Method of Independent Powering of 
Diagnostic Processes on a Computer 
System" 

"Diagnostic and Managing Distributed 
Processor System" 
"Method for Managing a Distributed 
Processor System" 
"System for Mapping Environmental 
Resources to Memory for Program 
Access" 

"Method for Mapping Environmental 
Resources to Memory for Program 
Access" 

"Hot Add of Devices Software 
Architecture" 

"Method for The Hot Add of Devices" 
"Hot Swap of Devices Software 
Architecture" 

"Method for The Hot Swap of 
Devices" 

"Method for the Hot Add of a Network 
Adapter on a System Including a 
Dynamically Loaded Adapter Driver" 
"Method for the Hot Add of a Mass
Storage Adapter on a System Including 
a Statically Loaded Adapter Driver" 
"Method for the Hot Add of a Network 
Adapter on a System Including a 
Statically Loaded Adapter Driver" 
"Method for the Hot Add of a Mass 
Storage Adapter on a System Including 
a Dynamically Loaded Adapter Driver" 
"Method for the Hot Swap of a 
Network Adapter on a System 
Including a Dynamically Loaded 
Adapter Driver" 

"Method for the Hot Swap of a Mass 
Storage Adapter on a System Including 
a Statically Loaded Adapter Driver" 
"Method for the Hot Swap of a 
Network Adapter on a System 
Including a Statically Loaded Adapter
Driver" 

"Method for the Hot Swap of a Mass 
Storage Adapter on a System Including 
a Dynamically Loaded Adapter Driver" 
"Method of Performing an Extensive 
Diagnostic Test in Conjunction with a 
BIOS Test Routine" 
"Apparatus for Performing an 
Extensive Diagnostic Test in 
Conjunction with a BIOS Test 
Routine" 

"Configuration Management Method 
for Hot Adding and Hot Replacing 
Devices" 

"Configuration Management System 
for Hot Adding and Hot Replacing 
Devices" 

"Apparatus for Interfacing Buses" 
"Method for Interfacing Buses" 
"Computer Fan Speed Control Device" 
"Computer Fan Speed Control Method" 
"System for Powering Up and 
Powering Down a Server" 
"Method of Powering Up and 
Powering Down a Server" 
"System for Resetting a Server" 
"Method of Resetting a Server" 
"System for Displaying Flight 
Recorder" 

"Method of Displaying Flight 
Recorder" 

"Synchronous Communication 
Interface" 

"Synchronous Communication 
Emulation" 

"Software System Facilitating the 
Replacement or Insertion of Devices in 
a Computer System" 
"Method for Facilitating the 
Replacement or Insertion of Devices in 
a Computer System" 
"System Management Graphical User 
Interface" 

"Display of System Information" 
"Data Management System Supporting 
Hot Plug Operations on a Computer" 
"Data Management Method Supporting 
Hot Plug Operations on a Computer" 
"Alert Configurator and Manager" 
"Managing Computer System Alerts" 
"Computer Fan Speed Control System" 
"Computer Fan Speed Control System 
Method" 

"Black Box Recorder for Information 
System Events" 

"Method of Recording Information 
System Events" 

"Method for Automatically Reporting a 
System Failure in a Server" 
"System for Automatically Reporting a 
System Failure in a Server" 
"Expansion of PCI Bus Loading 
Capacity" 

"Method for Expanding PCI Bus 
Loading Capacity" 

"System for Displaying System Status" 
"Method of Displaying System Status" 
"Fault Tolerant Computer System" 
"Method for Hot Swapping of Network 
Components" 

"A Method for Communicating a 
Software Generated Pulse Waveform 
Between Two Servers in a Network" 
"A System for Communicating a 
Software Generated Pulse Waveform
Between Two Servers in a Network" 
"Method for Clustering Software 
Applications" 
"System for Clustering Software
Applications" 

"Method for Automatically 
Configuring a Server after Hot Add of 
a Device" 

"System for Automatically Configuring 
a Server after Hot Add of a Device"

"Method of Automatically Configuring
and Formatting a Computer System
and Installing Software"

"System for Automatically Configuring
and Formatting a Computer System
and Installing Software"

"Determining Slot Numbers in a
Computer"

"System for Detecting Errors in a 
Network" 

"Method of Detecting Errors in a 
Network" 

"System for Detecting Network Errors" 
"Method of Detecting Network Errors" 

What is claimed is:

1. A system for reporting a failure condition in a server 
system, comprising: 

a controller which monitors the server system for system 
failures, and generates an event signal and failure 
information if a system failure is detected; 

a system interface, coupled to the controller, which 
receives the event signal and failure information; 

a central processing unit, coupled to the system interface, 
wherein, upon receiving the event signal, the system 
interface reports an occurrence of an event to the 
central processing unit; and 

a system log which receives failure information commu- 
nicated from the system interface and stores said failure 
information. 

2. The system of claim 1 wherein the system log is a 
nonvolatile random access memory.

3. The system of claim 1 wherein the system interface 
comprises a bit vector, having a plurality of bits, which 
receives the event signal and stores a value corresponding to 
the event signal, wherein the event signal changes the value 
of at least one bit of the bit vector. 

4. The system of claim 1 further comprising a system 
recorder, coupled between the controller and the system log, 
for receiving the failure information from the controller, 
assigning a time value to the failure information, and sub- 
sequently storing the failure information with the time value 
into the system log. 

5. The system of claim 1 wherein the central processing 
unit executes a software program which allows a system 
operator to access the system log to read the failure infor- 
mation. 

6. The system of claim 5 further comprising a monitor 
coupled to the central processing unit for displaying a 
message to the system operator. 

7. The system of claim 1 further comprising a remote 
interface, coupled to the controller, for receiving the event 
signal and reporting an occurrence of an event to a computer 
external to the server system. 

8. The system of claim 7 wherein the remote interface
comprises a bit vector, having a plurality of bits, which 
receives the event signal and stores a value corresponding to 
the event signal, wherein the event signal changes the value 
of at least one bit of the bit vector. 

9. The system of claim 7 wherein the computer stores and 
executes a software program which allows a user of the 
computer to access the system log to read the failure 
information. 

10. The system of claim 7 further comprising a switch, 
coupled to the remote interface, for switching connectivity 
to the remote interface between a first computer and a 
second computer. 

11. The system of claim 10 wherein the first computer is 
a local computer, coupled to the switch via a local commu- 
nications line, and the second computer is a remote 
computer, coupled to the switch via a modem-to-modem 
connection. 

12. A failure reporting system for a server system, com- 
prising: 

a controller which monitors the server system for system 
failures and generates an event signal and failure infor- 
mation if a system failure is detected; 

a system recorder, coupled to the controller, which 
receives failure information and assigns a time value to 
the failure information; 

a system log which stores failure information received 
from the system recorder; and 

a system interface, coupled to the controller, which 
receives and stores the event signal, and reports an 
occurrence of an event to a central processing unit 
which is coupled to the system interface, wherein the 
central processing unit executes a software program 
which allows a system operator to access the system 
log to read failure information stored therein. 

13. The system of claim 12 wherein the system log is a 
nonvolatile random access memory. 

14. The system of claim 12 wherein the system interface 
comprises a bit vector which receives the event signal and 
stores a value corresponding to the event signal, wherein the 
event signal changes the value of at least one bit of the bit 
vector. 

15. The system of claim 12 further comprising a remote 
interface, coupled to the controller, which receives the event 
signal and reports the occurrence of an event to a computer
external to the server system. 

16. The system of claim 15 wherein the remote interface 
comprises a bit vector which receives the event signal and 
stores a value corresponding to the event signal, wherein the 
event signal sets at least one bit of the bit vector to indicate 
that a system failure has occurred. 

17. The system of claim 15 further comprising a switch, 
coupled to the remote interface, which switches connectivity 
to the remote interface between a first computer and a
second computer. 

18. The system of claim 17 wherein the first computer is 
a local computer, coupled to the switch via a local commu- 
nications line, and the second computer is a remote 
computer, coupled to the switch via a modem connection. 

19. A failure reporting system for a server system, com- 
prising: 

a controller which monitors the server system for system 
failures and generates an event signal and failure infor- 
mation if a system failure is detected; 

a system recorder, coupled to the controller, which 
receives the failure information and assigns a date and 
time to the failure information;

a system log which stores the failure information;

a system interface, coupled to the controller, which 
receives and stores the event signal and reports an 
occurrence of an event to a central processing unit, 
coupled to the system interface, wherein the central 
processing unit executes a software program which 
allows a system operator to access the system log to 
read failure information stored therein; 

a remote interface, coupled to the controller, which 
receives the event signal and reports the occurrence of 
an event to a computer external to the server system; 
and 

a switch, coupled to the remote interface, which switches 
connectivity to the remote interface between a first 
computer and a second computer, wherein the first 
computer is a local computer, coupled to the switch via 
a local communications line, and the second computer 
is a remote computer, coupled to the switch via a 
modem connection. 

20. A failure reporting system in a server system, com- 
prising: 

means for detecting a system failure condition; 

means for transmitting failure information related to the
failure condition to a system recorder;

means for storing the failure information; and

means for reporting an occurrence of an event to a central
processing unit of the server system.

21. The system of claim 20 further comprising means for 
notifying a human operator of the system failure. 

22. The system of claim 21 wherein the means for 
notifying a human operator comprises means for displaying 
a message on a monitor coupled to the central processing 
unit. 

23. The system of claim 21 further comprising means for 
accessing the system log to read the failure information from 
the system log. 

24. The method of claim 20 further comprising means for 
determining a time when the failure condition occurred and 
means for storing the time with the failure information. 

25. The system of claim 20 wherein the means for 
reporting the occurrence of the event to the central process- 
ing unit comprises: 

means for sending an event signal to a system interface, 
coupled to the central processing unit; 

means for setting a bit in a bit vector within the system 
interface, wherein the setting of the bit corresponds to 
a specified type of system failure; and 

means for sending an interrupt signal to the central 
processing unit after the bit is set, wherein, upon 
receiving the interrupt signal the central processing unit 
reads a status register within the system interface to 
ascertain that the event signal has been received by the 
system interface. 

26. The system of claim 25 further comprising means for 
reading the bit vector to ascertain the type of system failure. 

27. The method of claim 20 wherein the means for 
reporting the occurrence of the event to the central process- 
ing unit comprises: 

means for sending an event signal to a system interface,
coupled to the central processing unit;

means for setting a bit in a bit vector within the system
interface, wherein the setting of the bit corresponds to
a specified type of system failure; and

means for setting a status of a status register within the 
system interface to indicate the occurrence of the event, 
wherein the central processing unit monitors the status
register within the system interface at specified periodic 
intervals. 

28. The system of claim 27 further comprising means for 
reading the bit vector to ascertain the type of system failure. 

29. A system for reporting a failure condition in a server 
system, comprising: 

means for detecting the failure condition; 

means for generating and transmitting failure information 
related to the failure condition to a system recorder; 

means for assigning a time value to the failure informa- 
tion; 

means for storing the failure information and its time
value into a system log;

means for reporting an occurrence of an event to a local
computer coupled to the server system via a remote
interface;

means for accessing the system log; and

means for reading the failure information.

30. The system of claim 29 wherein the means for 
reporting the occurrence of the event to the local computer 
comprises: 

means for sending an event signal to the remote interface;

means for setting a bit in a bit vector within the remote
interface, wherein the setting of the bit corresponds to
a specified type of system failure; and

means for notifying the local computer that the event
signal has been received by the remote interface.

31. The system of claim 30 wherein the means for 
notifying the local computer comprises means for transmit- 
ting a ready-to-read signal to the local computer, wherein, 
upon receiving the ready-to-read signal, the local computer 
interrogates the remote interface to ascertain that the bit in 
the bit vector has been set. 

32. The system of claim 31 further comprising means for 
notifying a local operator, who is using the local computer, 
of the system failure. 

33. The system of claim 32 wherein the means for 
notifying the local operator comprises means for displaying 
a message on a monitor coupled to the local computer. 

34. A system for reporting a failure condition in a server 
system, comprising: 

means for detecting the failure condition; 

means for generating and transmitting failure information 
related to the failure condition across a control bus 
from a first microcontroller to a system recorder micro- 
controller; 

means for assigning a time value to the failure informa- 
tion; 

means for storing the failure information and its time 
value into a system log; 

means for reporting an occurrence of an event to a remote 
computer coupled to the server system via a remote 
interface, wherein the remote computer is connected to 
the remote interface via a modem connection; 

means for accessing the system log via the system 
recorder microcontroller; and 

means for reading the failure information. 

35. The system of claim 34 wherein the means for 
reporting the occurrence of the event to the remote computer 
comprises: 

means for sending an event signal to the remote interface;

means for setting a bit in a bit vector within the remote
interface, wherein the setting of the bit corresponds to
a specified type of system failure; and

means for notifying the remote computer that the event
signal has been received by the remote interface.

36. The system of claim 35 wherein the means for
notifying the remote computer comprises:

means for automatically calling a modem number corresponding
to a modem coupled to the remote computer,
wherein, upon receiving the call, the remote computer
interrogates the remote interface to ascertain that the bit
in the bit vector has been set.

37. The system of claim 36 further comprising:

means for verifying that the remote computer is autho- 
rized to access the server system via the remote inter- 
face; and 

means for verifying that a communication link has been
established between the remote computer and the
remote interface.

38. The system of claim 34 further comprising means for
notifying a remote operator, who is using the remote
computer, of the system failure.

39. The system of claim 38 wherein the means for
notifying the remote operator comprises means for display- 
ing a message on a monitor coupled to the remote computer. 

40. A program storage device storing instructions that
when executed by a computer perform a method, wherein
the method comprises:

detecting a system failure condition;

transmitting failure information related to the failure
condition to a system recorder;

storing the failure information in a system log; and

reporting an occurrence of the failure condition to a
central processing unit.

41. The device of claim 40 wherein the method further 
comprises notifying an operator of the system failure. 

42. The device of claim 41 wherein the act of notifying an 
operator comprises displaying a message on a monitor 
coupled to the central processing unit. 

43. The device of claim 41 wherein the method further 
comprises accessing the system log to read the failure 
information from the system log. 

44. The device of claim 40 wherein the method further 
comprises determining when the failure condition occurred 
and storing a representation of when the failure condition 
occurred in the system log. 

45. The device of claim 40 wherein the act of reporting the 
occurrence of the failure condition to the central processing 
unit comprises: 

sending an event signal to a system interface, coupled to
the central processing unit;

setting a bit in a bit vector within the system interface,
wherein the setting of the bit corresponds to a specified
type of system failure; and

sending an interrupt signal to the central processing unit
after the bit is set, wherein, upon receiving the interrupt
signal the central processing unit reads a status register
within the system interface to ascertain that the event
signal has been received by the system interface.

46. The device of claim 45 wherein the method further 
comprises reading the bit vector to ascertain a type of event. 

47. The device of claim 40 wherein the act of reporting the
occurrence of the failure condition to the central processing
unit comprises:

sending an event signal to a system interface, coupled to
the central processing unit;

setting a bit in a bit vector within the system interface,
wherein the setting of the bit corresponds to a specified
type of system failure; and

setting a status of a status register within the system
interface to indicate the occurrence of the event, 
wherein the central processing unit monitors the status 
register within the system interface at specified periodic 
intervals. 

48. The device of claim 47 wherein the method further 
comprises reading the bit vector to ascertain a type of event. 

49. The device of claim 40 wherein the method further 
comprises reporting the occurrence of the failure condition 
to a local computer connected to server system via a remote 
interface. 

50. The device of claim 49 wherein the act of reporting the 
occurrence of the failure condition to the local computer 
comprises: 

sending an event signal to the remote interface; 

setting a bit in a bit vector within the remote interface, 
wherein the setting of the bit corresponds to a specified 
type of system failure; and 

notifying the local computer that the event signal has been 
received by the remote interface. 

51. The device of claim 50 wherein the act of notifying the 
local computer comprises transmitting a ready-to-read sig- 
nal to the local computer, wherein, upon receiving the 
ready-to-read signal, the local computer interrogates the 
remote interface to ascertain that the bit in the bit vector has 
been set. 

52. The device of claim 51 wherein the method further 
comprises notifying a local operator, who is using the local 
computer, of the system failure. 

53. The device of claim 52 wherein the act of notifying the 
local operator comprises displaying a message on a monitor 
coupled to the local computer. 

54. The device of claim 52 wherein the method further 
comprises accessing the system log through the local com- 
puter to read the failure information. 

55. The device of claim 40 wherein the method further 
comprises reporting the occurrence of the failure condition
to a remote computer connected to the server system via a
remote interface, wherein the remote computer is connected 
to the remote interface via a modem-to-modem connection. 

56. The device of claim 55 wherein the act of reporting the 
occurrence of the failure condition to the remote computer 
comprises: 

sending an event signal to the remote interface; 

setting a bit in a bit vector within the remote interface, 
wherein the setting of the bit corresponds to a specified 
type of system failure; and 

notifying the remote computer that the event signal has 
been received by the remote interface. 

57. The device of claim 56 wherein the act of notifying the 
remote computer comprises: 

automatically calling a phone number corresponding to a 
modem coupled to the remote computer, wherein, upon 
receiving the call, the remote computer interrogates the 
remote interface to ascertain that the bit in the bit vector 
has been set. 

58. The device of claim 57 wherein the method further 
comprises: 

verifying that the remote computer is authorized to access 
the server system via the remote interface; and 

verifying that a communication link has been established 
between the remote computer and the remote interface. 

59. The device of claim 57 wherein the method further 
comprises notifying a remote operator, who is using the 
remote computer, of the system failure. 

60. The device of claim 59 wherein the act of notifying the 
remote operator comprises displaying a message on a moni- 
tor coupled to the remote computer. 

61. The device of claim 59 wherein the method further 
comprises accessing the system log through the remote 
computer to read the failure information. 



